
    The Use of Parallel Processing in VLSI Computer-Aided Design Application

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Semiconductor Research Corporation / 87-DP-10

    Parallel Processing for VLSI CAD Applications: A Tutorial

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Semiconductor Research Corporation. Author's name appears in front matter as Prithviraj Banerje.

    Space-Borne Computing for the Year 2000 and Beyond

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Aeronautics and Space Administration (NASA) / NAG-1-61

    Evaluation of scheduling and allocation algorithms while mapping assembly code onto FPGAs

    Migration of software from older general-purpose embedded processors onto newer mixed hardware/software Systems-on-Chip (SoC) platforms is becoming an increasingly important topic. Automatic translation of general-purpose software binaries and assembly code onto hardware implementations using FPGAs requires sophisticated scheduling and allocation algorithms to maximize the resource utilization of such devices. This paper describes the effects of scheduling and chaining of node operations in a control and data flow graph (CDFG) onto an FPGA. The effects of register allocation on scheduled nodes are also discussed. The Texas Instruments C6000 DSP processor architecture was chosen as the source platform and assembly code, and the Xilinx Virtex II XC2V250 was chosen as the target FPGA. Results are reported on ten benchmarks; they show that scheduling with operation chaining produces the best results on FPGAs, while adding register allocation actually generates poorer designs in terms of area and frequency.
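
To make "chaining" concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling of a CDFG with and without operation chaining. It is not the paper's algorithm: the function name `schedule_asap`, the per-operation delays, and the 10 ns clock are hypothetical, and multi-cycle operations are simply rejected. With chaining, a dependent operation may share a clock cycle with its predecessor as long as the accumulated combinational delay still fits in one clock period.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule_asap(deps, delay, clock_period, chaining=True):
    """Assign each CDFG node a clock cycle, as soon as possible.

    deps:  node -> iterable of predecessor nodes (data dependences)
    delay: node -> combinational delay (same unit as clock_period)
    Hypothetical helper for illustration only.
    """
    cycle = {}   # node -> assigned clock cycle
    finish = {}  # node -> time its result is ready within that cycle
    for node in TopologicalSorter(deps).static_order():
        d = delay[node]
        if d > clock_period:
            raise ValueError(f"{node} needs more than one cycle (not modelled here)")
        preds = list(deps.get(node, ()))
        if not chaining:
            # Every result is registered: start one cycle after the latest predecessor.
            cycle[node] = max((cycle[p] + 1 for p in preds), default=0)
            finish[node] = d
            continue
        # Earliest cycle is the latest predecessor's cycle; inside it, wait for
        # predecessors scheduled in that same cycle (earlier ones sit in registers).
        c = max((cycle[p] for p in preds), default=0)
        start = max((finish[p] for p in preds if cycle[p] == c), default=0.0)
        if start + d > clock_period:
            c, start = c + 1, 0.0  # does not fit: move to the next cycle
        cycle[node] = c
        finish[node] = start + d
    return cycle


# Hypothetical chain add -> add -> multiply with a 10 ns clock.
deps = {"add1": [], "add2": ["add1"], "mul": ["add2"]}
delay = {"add1": 3, "add2": 3, "mul": 6}
print(schedule_asap(deps, delay, 10))                  # {'add1': 0, 'add2': 0, 'mul': 1}
print(schedule_asap(deps, delay, 10, chaining=False))  # {'add1': 0, 'add2': 1, 'mul': 2}
```

In this toy case chaining packs the two additions into one cycle, so the schedule needs two cycles instead of three; the trade-off is a longer combinational path within each cycle.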

    A Parallel Branch And Bound Algorithm For Test Generation

    For circuits of VLSI complexity, test generation time can be prohibitive. Most of the time is consumed by hard-to-detect (HTD) faults, which might remain undetected even after a large number of backtracks. We identify the problems inherent in a uniprocessor implementation of a test generation algorithm and propose a parallel test generation algorithm that tries to achieve high fault coverage for HTD faults in a reasonable amount of time. A dynamic search space allocation strategy ensures that the search spaces allocated to different processors are disjoint. The parallel test generation algorithm has been implemented on an Intel iPSC/2 hypercube. Results on the ISCAS combinational benchmark circuits conclusively prove that parallel processing of HTD faults does indeed result in high fault coverage that is otherwise not achievable by a uniprocessor algorithm in limited CPU time. The parallel algorithm exhibits superlinear speedups in some cases due..
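
The key property of the allocation strategy, that processors search disjoint parts of the input space, can be illustrated with a simplified sketch. It uses a static split by fixing the first k primary inputs, not the paper's dynamic allocation, and replaces branch-and-bound with exhaustive enumeration; the function names and the toy circuit are hypothetical.

```python
from itertools import product


def split_search_space(num_inputs, num_procs):
    """Give each processor a disjoint set of fixed values for the first k inputs."""
    k = min(num_inputs, (num_procs - 1).bit_length())  # smallest k with 2**k >= num_procs
    prefixes = list(product((0, 1), repeat=k))
    # Round-robin so every prefix (hence every input vector) has exactly one owner.
    return [prefixes[p::num_procs] for p in range(num_procs)]


def search_subspace(prefixes, num_inputs, detects):
    """Exhaustively try the vectors in one processor's subspace; return the first test."""
    for prefix in prefixes:
        for suffix in product((0, 1), repeat=num_inputs - len(prefix)):
            vector = prefix + suffix
            if detects(vector):
                return vector
    return None


# Hypothetical toy circuit z = (a AND b) OR c, with the AND output stuck-at-0.
def good(v):
    a, b, c = v
    return (a & b) | c


def faulty(v):
    _, _, c = v
    return c  # the AND gate contributes a constant 0


def detects(v):
    return good(v) != faulty(v)


subspaces = split_search_space(num_inputs=3, num_procs=4)
print([search_subspace(s, 3, detects) for s in subspaces])
# [None, None, None, (1, 1, 0)]
```

Each `search_subspace` call could run on a separate processor; because the prefixes never overlap, no two processors ever examine the same input vector.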

    A Matrix-Based Approach to the Global Locality Optimization Problem

    Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop transformations are restricted by data dependences and may not be very successful in optimizing imperfectly nested loops, while the impact of a data transformation on an array may be program-wide. Therefore, in this paper we argue for a combined approach that employs both loop and data transformations. The method enjoys the advantages of most previous techniques for enhancing locality and is efficient. In our approach, the loop nests are processed one by one, and the data layout constraints obtained from one nest are propagated to optimize the remaining loop nests. We show that this process can be put in a simple matrix framework that can be manipulated by an optimizing compiler. The search space that we consider for possible loop transformations comprises general non-singular linear transformation..
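
As background for "general non-singular linear transformations", the sketch below shows the standard legality test for one candidate loop transformation matrix T: T must be non-singular, and every dependence distance vector d must remain lexicographically positive after mapping to T·d. This is only the loop half of the problem; the paper's matrix framework, which also chooses data layouts, is not reproduced, and all names here are hypothetical.

```python
from fractions import Fraction


def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def transform(T, v):
    """Matrix-vector product T @ v on integer tuples."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def lex_positive(v):
    """True if the first non-zero entry of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal(T, dependences):
    """A non-singular T is legal if every dependence stays lexicographically positive."""
    return determinant(T) != 0 and all(lex_positive(transform(T, d)) for d in dependences)


# A[i][j] = A[i-1][j+1] carries the dependence distance (1, -1).
interchange = [[0, 1], [1, 0]]
skew_then_interchange = [[1, 1], [1, 0]]  # (i, j) -> (i + j, i)
print(is_legal(interchange, [(1, -1)]))            # False
print(is_legal(skew_then_interchange, [(1, -1)]))  # True
```

Plain interchange would reverse the dependence, but skewing first makes the same reordering legal, which is why searching the general non-singular space is richer than trying individual loop transformations one at a time.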

    Compile-Time Estimation of Communication Costs of Programs

    One of the most challenging problems in compiling for distributed-memory machines is determining how a program's data should be distributed across processors. Any compiler that makes data partitioning decisions needs a mechanism for estimating the communication and computational costs of programs in order to compare different alternatives. This paper presents a methodology for estimating the communication costs of programs written for a global address space. In this approach, the compiler analyzes programs before generating communication, yet still takes into account important communication optimizations that will be performed. We introduce the notion of traversal properties of array references in loops, which help identify the nature and extent of data movement in terms of high-level communication primitives. This enables the compiler to obtain more precise information about the global state of communication, and to do so in a largely machine-independent manner. The methodology described in this paper has been i..
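
A minimal sketch of the general idea of compile-time cost estimation: recognize a reference such as `A[i + offset]` in a loop over a BLOCK-distributed array as a nearest-neighbour shift, and estimate its volume with a closed form instead of simulating the loop. It assumes owner-computes with the left-hand side aligned to `A`, an array length divisible by the processor count, and |offset| no larger than a block. The paper's traversal-property analysis is not reproduced here, and all function names are hypothetical; the brute-force counter only serves to check the formula.

```python
def owner(index, n, procs):
    """Processor owning A[index] under a BLOCK distribution (assumes n % procs == 0)."""
    return index // (n // procs)


def count_nonlocal_reads(n, procs, offset):
    """Brute force: reads of A[i + offset] whose owner differs from the owner of A[i]."""
    lo, hi = max(0, -offset), min(n, n - offset)  # keep i + offset inside the array
    return sum(1 for i in range(lo, hi) if owner(i, n, procs) != owner(i + offset, n, procs))


def estimate_shift_volume(n, procs, offset):
    """Closed form: a shift moves |offset| elements across each of the procs - 1
    block boundaries, provided |offset| does not exceed the block size."""
    if n % procs != 0 or abs(offset) > n // procs:
        raise ValueError("model only covers evenly divided blocks and nearest-neighbour shifts")
    return (procs - 1) * abs(offset)


# Hypothetical loop: for i: B[i] = A[i + 2], with 16 elements on 4 processors.
print(estimate_shift_volume(16, 4, 2), count_nonlocal_reads(16, 4, 2))  # 6 6
```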